XArray expands on the capabilities of NumPy arrays, streamlining many common data manipulation tasks. In that respect it is similar to Pandas, but whereas Pandas excels at working with tabular data, XArray is focused on N-dimensional arrays of data (i.e. grids). Its interface is based largely on the netCDF data model (variables, attributes, and dimensions), but it goes beyond the traditional netCDF interfaces to provide functionality similar to netCDF-java's Common Data Model (CDM).
DataArray
The DataArray is one of the basic building blocks of XArray. It provides a NumPy ndarray-like object that adds two critical pieces of functionality: named dimensions with associated coordinate values, and a container for attribute metadata.
In [ ]:
# Convention for import to get shortened namespace
import numpy as np
import xarray as xr
In [ ]:
# Create some sample "temperature" data
data = 283 + 5 * np.random.randn(5, 3, 4)
data
Here we create a basic DataArray by passing it just a NumPy array of random data. Note that XArray generates some basic dimension names for us.
In [ ]:
temp = xr.DataArray(data)
temp
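As a quick check of the generated names, the sketch below builds a throwaway array (the name `sample` is only for illustration) and prints the dimension names XArray assigns when none are given:

```python
import numpy as np
import xarray as xr

# A placeholder array with the same shape as the sample data above
sample = xr.DataArray(np.zeros((5, 3, 4)))

# With no dims argument, the dimensions are named dim_0, dim_1, ...
print(sample.dims)
print(sample.shape)
```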
We can also pass in our own dimension names:
In [ ]:
temp = xr.DataArray(data, dims=['time', 'lat', 'lon'])
temp
This is already an improvement over a plain NumPy array, because each dimension (or axis, in NumPy parlance) has a name. Even better, we can take arrays representing the coordinate values for each of these dimensions and associate them with the data when we create the DataArray.
In [ ]:
# Use pandas to create an array of datetimes
import pandas as pd
times = pd.date_range('2018-01-01', periods=5)
times
In [ ]:
# Sample lon/lats
lons = np.linspace(-120, -60, 4)
lats = np.linspace(25, 55, 3)
When we create the DataArray instance, we pass in the arrays we just created:
In [ ]:
temp = xr.DataArray(data, coords=[times, lats, lons], dims=['time', 'lat', 'lon'])
temp
...and we can also set some attribute metadata:
In [ ]:
temp.attrs['units'] = 'kelvin'
temp.attrs['standard_name'] = 'air_temperature'
temp
Notice what happens if we perform a mathematical operation with the DataArray: the coordinate values persist, but the attributes are lost. This is done because it is very challenging to know if the attribute metadata is still correct or appropriate after arbitrary arithmetic operations.
In [ ]:
# For example, convert Kelvin to Celsius
temp - 273.15
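If the metadata should survive an operation, one option is xarray's `keep_attrs` setting. The sketch below uses a small hypothetical array, `temp_k`, rather than the tutorial data. Because the carried-over units would be stale after a Kelvin-to-Celsius conversion, it updates them by hand, which is exactly the kind of judgment XArray cannot make automatically.

```python
import numpy as np
import xarray as xr

# Hypothetical two-element temperature series with metadata attached
temp_k = xr.DataArray(
    np.array([283.15, 293.15]),
    dims=['time'],
    attrs={'units': 'kelvin', 'standard_name': 'air_temperature'},
)

# Ask xarray to carry attributes through the arithmetic
with xr.set_options(keep_attrs=True):
    temp_c = temp_k - 273.15

# The copied units are now wrong for the new values, so fix them explicitly
temp_c.attrs['units'] = 'degC'
print(temp_c.attrs)
```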
In [ ]:
temp.sel(time='2018-01-02')
.sel can also perform nearest-neighbor sampling, taking an optional tolerance:
In [ ]:
from datetime import timedelta
temp.sel(time='2018-01-07', method='nearest', tolerance=timedelta(days=2))
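To see what the tolerance does, the sketch below repeats the lookup on a small hypothetical series (`series`, one value per day). The last timestamp, 2018-01-05, is two days from the requested date, so a two-day tolerance matches, while a one-day tolerance is expected to raise a `KeyError`.

```python
from datetime import timedelta

import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily series: values 0..4 on 2018-01-01 .. 2018-01-05
times = pd.date_range('2018-01-01', periods=5)
series = xr.DataArray(np.arange(5.0), coords=[times], dims=['time'])

# Within tolerance: the nearest point (2018-01-05) is exactly two days away
nearest = series.sel(time='2018-01-07', method='nearest',
                     tolerance=timedelta(days=2))
print(nearest.time.values)

# Outside tolerance: no point lies within one day of the requested date
missed = False
try:
    series.sel(time='2018-01-07', method='nearest',
               tolerance=timedelta(days=1))
except KeyError as err:
    missed = True
    print('No match within tolerance:', err)
```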
.interp() works similarly to .sel(). Using .interp(), get an interpolated time series "forecast" for Boulder (40°N, 105°W) or your favorite latitude/longitude location. (Documentation for interp).
In [ ]:
# Your code goes here
In [ ]:
# %load solutions/interp_solution.py
In [ ]:
temp.sel(time=slice('2018-01-01', '2018-01-03'), lon=slice(-110, -70), lat=slice(25, 45))
In [ ]:
# As done above
temp.loc['2018-01-02']
In [ ]:
temp.loc['2018-01-01':'2018-01-03', 25:45, -110:-70]
In [ ]:
# This *doesn't* work however:
#temp.loc[-110:-70, 25:45,'2018-01-01':'2018-01-03']
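The positional form depends on the order of the dimensions. One way around this, sketched below on freshly generated data with the same coordinates as above, is to pass `.loc` a dictionary keyed by dimension name, so the order no longer matters.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild an array with the same coordinates as the tutorial example
times = pd.date_range('2018-01-01', periods=5)
lats = np.linspace(25, 55, 3)
lons = np.linspace(-120, -60, 4)
temp = xr.DataArray(np.random.randn(5, 3, 4),
                    coords=[times, lats, lons],
                    dims=['time', 'lat', 'lon'])

# Positional .loc: slices must follow the (time, lat, lon) order
positional = temp.loc['2018-01-01':'2018-01-03', 25:45, -110:-70]

# Dictionary .loc: dimensions are named, so any order works
by_name = temp.loc[dict(lon=slice(-110, -70),
                        lat=slice(25, 45),
                        time=slice('2018-01-01', '2018-01-03'))]

print(positional.equals(by_name))
print(by_name.shape)
```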
In [ ]:
# Open sample North American Reanalysis data in netCDF format
ds = xr.open_dataset('../../data/NARR_19930313_0000.nc')
ds
This returns a Dataset object, a container holding one or more DataArrays that can optionally share coordinates. We can then pull out individual fields:
In [ ]:
ds.isobaric1
or
In [ ]:
ds['isobaric1']
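For readers without the NARR file at hand, the two access styles can be tried on a small synthetic Dataset. Everything in this sketch (`toy`, `temperature`, `u_wind`, the `isobaric` levels) is made up for illustration and is not the structure of the real file.

```python
import numpy as np
import xarray as xr

# Hypothetical shared coordinates: three pressure levels on a 3x4 grid
levels = [1000.0, 850.0, 500.0]
y = np.arange(3)
x = np.arange(4)
dims = ['isobaric', 'y', 'x']

temperature = xr.DataArray(280 + np.random.randn(3, 3, 4),
                           coords=[levels, y, x], dims=dims)
u_wind = xr.DataArray(10 * np.random.randn(3, 3, 4),
                      coords=[levels, y, x], dims=dims)

# A Dataset groups the arrays, which share the same coordinates
toy = xr.Dataset({'temperature': temperature, 'u_wind': u_wind})

# Attribute access and dictionary-style access return the same variable
print(toy.temperature.identical(toy['temperature']))
print(list(toy.data_vars))
```

Attribute access only works for names that are valid Python identifiers; a name such as `u-component_of_wind_isobaric` has to be reached with the bracket form.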
Datasets also support many of the same subsetting operations as DataArray, but perform the operation on all data:
In [ ]:
ds_1000 = ds.sel(isobaric1=1000.0)
ds_1000
In [ ]:
ds_1000.Temperature_isobaric
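The same behavior can be checked on a synthetic Dataset. In the sketch below (all names hypothetical), selecting a single `isobaric` level drops that dimension from every variable that has it, while a variable without that dimension, here `terrain`, passes through unchanged.

```python
import numpy as np
import xarray as xr

levels = [1000.0, 850.0, 500.0]
y = np.arange(3)
x = np.arange(4)

toy = xr.Dataset(
    {
        # 3-D fields that vary with pressure level
        'temperature': (['isobaric', 'y', 'x'], 280 + np.random.randn(3, 3, 4)),
        'u_wind': (['isobaric', 'y', 'x'], 10 * np.random.randn(3, 3, 4)),
        # 2-D field with no isobaric dimension
        'terrain': (['y', 'x'], np.random.rand(3, 4)),
    },
    coords={'isobaric': levels, 'y': y, 'x': x},
)

# One call subsets every variable that has the isobaric dimension
toy_1000 = toy.sel(isobaric=1000.0)

for name, var in toy_1000.data_vars.items():
    print(name, var.dims)
```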
In [ ]:
u_winds = ds['u-component_of_wind_isobaric']
u_winds.std(dim=['x', 'y'])
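Reductions like this collapse the named dimensions and keep the rest. The sketch below uses a small hypothetical array `u` with known values so the result is easy to verify by hand: reducing over `x` and `y` leaves one value per `isobaric` level.

```python
import numpy as np
import xarray as xr

# Hypothetical field: values 0..23 on 2 levels of a 3x4 grid
u = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 4),
    coords=[[1000.0, 850.0], np.arange(3), np.arange(4)],
    dims=['isobaric', 'y', 'x'],
)

# Reduce over the horizontal dimensions by name
mean_profile = u.mean(dim=['x', 'y'])
spread = u.std(dim=['x', 'y'])

print(mean_profile.dims)
print(mean_profile.values)
print(spread.values)
```

Naming the dimensions avoids having to remember which NumPy axis number corresponds to `x` or `y`.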
Using the sample dataset, calculate the mean temperature profile (temperature as a function of pressure) over Colorado within this dataset. For this exercise, consider the bounds of Colorado to be:
(37°N to 41°N and 102°W to 109°W projected to Lambert Conformal projection coordinates)
In [ ]:
# %load solutions/mean_profile.py
There is much more in the XArray library. To learn more, visit the XArray Documentation.